-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Bring throw helpers to PUSH_COOP_PINVOKE_FRAME plan #123015
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Tagging subscribers to this area: @mangod9 |
1a243a6 to
00c4f6a
Compare
b8782d2 to
0d677dd
Compare
0d677dd to
5359b4c
Compare
| // the xmm registers are not supported by the libunwind | ||
| .endm | ||
|
|
||
| // Unaligned version for use when stack alignment cannot be guaranteed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What prevents us from ensuring that the stack is sufficiently aligned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constraint is GetOffsetOfFloatArgumentRegisters() returns a fixed -128. TransitionBlock must point to ArgumentRegisters at rsp+136. So floats must be at rsp+136-128 = rsp+8. After call + 12 pushes + alloc_stack 136, rsp is 16-byte aligned, but rsp+8 is NOT 16-byte aligned. Without changing the TransitionBlock layout (which would require changes across many files), we can't achieve 16-byte alignment for float saves. Using unaligned stores for this exception throw path is acceptable since it's not performance critical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand - the PROLOG_WITH_TRANSITION_BLOCK pushes the same set of registers and it has the floats at aligned locations. Actually, why cannot we use the PROLOG_WITH_TRANSITION_BLOCK to do the work here? It seems we are pushing exactly the same set of registers that the macro does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@janvorli, I tried PROLOG_WITH_TRANSITION_BLOCK and reverted it as it was failing linux-arm64, win-arm64 and linux-arm which were passing earlier. PROLOG_WITH_TRANSITION_BLOCK doesn't save FP callee-saved registers (d8-d15 on ARM64, d8-d15 on ARM32). Exception dispatch needs these values to correctly restore context during unwinding. We could add them after PROLOG_WITH_TRANSITION_BLOCK, but that adds complexity. The custom macro keeps everything self-contained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand where is the added complexity. Adding the callee saved FP registers seems additive to PROLOG_WITH_TRANSITION_BLOCK. Also, as for the alignment,
this is the stack layout on Windows, you can see we have a padding there to align the floats.
; r9
; r8
; rdx
; rcx <- __PWTB_ArgumentRegisters
; return address
; CalleeSavedRegisters::r15
; CalleeSavedRegisters::r14
; CalleeSavedRegisters::r13
; CalleeSavedRegisters::r12
; CalleeSavedRegisters::rbp
; CalleeSavedRegisters::rbx
; CalleeSavedRegisters::rsi
; CalleeSavedRegisters::rdi <- __PWTB_StackAlloc
; padding to align xmm save area
; xmm3
; xmm2
; xmm1
; xmm0 <- __PWTB_FloatArgumentRegisters
; extra locals + padding to qword align
Similar on Unix x64:
// return address
// CalleeSavedRegisters::rbp
// CalleeSavedRegisters::rbx
// CalleeSavedRegisters::r15
// CalleeSavedRegisters::r14
// CalleeSavedRegisters::r13
// CalleeSavedRegisters::r12
// ArgumentRegisters::r9
// ArgumentRegisters::r8
// ArgumentRegisters::rcx
// ArgumentRegisters::rdx
// ArgumentRegisters::rsi
// ArgumentRegisters::rdi <- __PWTB_StackAlloc, __PWTB_TransitionBlock
// padding to align xmm save area
// xmm7
// xmm6
// xmm5
// xmm4
// xmm3
// xmm2
// xmm1
// xmm0 <- __PWTB_FloatArgumentRegisters
// extra locals + padding to qword align
I've looked into it and the TransitionBlock::GetOffsetOfFloatArgumentRegisters is actually never called on x64 windows and the value it returns would be incorrect there. On x64 Unix, the function is called, but it seems that the result is not used for anything (verifying now).
So that would explain why the incorrect offset doesn't hurt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I was right, I've changed the GetNegSpace as follows and all coreclr tests on Linux x64 still pass:
static int GetNegSpaceSize()
{
LIMITED_METHOD_CONTRACT;
int negSpaceSize = 0;
#ifdef CALLDESCR_FPARGREGS
negSpaceSize += sizeof(FloatArgumentRegisters);
#endif
#if defined(TARGET_ARM) || defined(TARGET_AMD64)
negSpaceSize += TARGET_POINTER_SIZE; // padding to make FloatArgumentRegisters address properly aligned
#endif
return negSpaceSize;
}
src/coreclr/vm/jithelpers.cpp
Outdated
|
|
||
| // Set the last thrown object before dispatching the exception. | ||
| // This is required for exception handling code that checks LastThrownObject. | ||
| pThread->SafeSetLastThrownObject(oref); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right, I don't understand why this change would require adding this. The last thrown object is set in the EH code when the first pass starts. It is part of the Thread::SafeSetThrowables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am experimenting with it for win-x64 SEH thing. That's the only platform failing tests at the moment as something is corrupting MXCSR. I am now debugging it cdb at the moment, but since I'm not a windows expert, I'd need some help.
This change was just a "maybe that would fix win-x64", not something needed by any platform (including win-x64). Reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets see if 22571b2 fixes it. I spent the whole evening debugging with cdbx64.exe to get codegen tests pass. 😅
| // the xmm registers are not supported by the libunwind | ||
| .endm | ||
|
|
||
| // Unaligned version for use when stack alignment cannot be guaranteed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand - the PROLOG_WITH_TRANSITION_BLOCK pushes the same set of registers and it has the floats at aligned locations. Actually, why cannot we use the PROLOG_WITH_TRANSITION_BLOCK to do the work here? It seems we are pushing exactly the same set of registers that the macro does.
This reverts commit af084b8.
Added MXCSR reset to prevent floating point exceptions during JIT compilation on AMD64 Windows.
Fixes #116375.